Dimension reduction and shrinkage methods for high dimensional disease risk scores in historical data.

Authors

  • Hiraku Kumamaru
  • Sebastian Schneeweiss
  • Robert J Glynn
  • Soko Setoguchi
  • Joshua J Gagne
Abstract

BACKGROUND Multivariable confounder adjustment in comparative studies of newly marketed drugs can be limited by small numbers of exposed patients and even fewer outcomes. Disease risk scores (DRSs) developed in historical comparator drug users before the new drug entered the market may improve adjustment. However, in a high dimensional data setting, empirical selection of hundreds of potential confounders and modeling of the DRS, even in the historical cohort, can lead to over-fitting and reduced predictive performance in the study cohort. We propose combining dimension reduction and shrinkage methods to overcome this problem, and we compared the performance of these modeling strategies for implementing high dimensional (hd) DRSs from historical data in two empirical examples of newly marketed drugs versus comparator drugs after the new drugs' market entry: dabigatran versus warfarin for the outcome of major hemorrhagic events, and cyclooxygenase-2 inhibitors (coxibs) versus nonselective non-steroidal anti-inflammatory drugs (nsNSAIDs) for gastrointestinal bleeds.

RESULTS In the dabigatran example, historical hdDRSs that included predefined and empirical outcome predictors with dimension reduction (principal component analysis; PCA) and shrinkage (lasso and ridge regression) approaches had higher c-statistics in the warfarin users (0.66 for the PCA model, 0.64 for the PCA + ridge model, and 0.65 for the PCA + lasso model) than an unreduced model (c-statistic, 0.54). The odds ratio (OR) from PCA + lasso hdDRS-stratification [OR, 0.64; 95 % confidence interval (CI) 0.46-0.90] was closer to the benchmark estimate (0.93) from a randomized trial than the OR from the model without empirical predictors (OR, 0.58; 95 % CI 0.41-0.81). In the coxibs example, c-statistics of the hdDRSs in the nsNSAID initiators were 0.66 for the PCA model, 0.67 for the PCA + ridge model, and 0.67 for the PCA + lasso model; these were higher than for the unreduced model (c-statistic, 0.45) and comparable to the demographics + risk score model (c-statistic, 0.67).

CONCLUSIONS Implementing hdDRSs from historical data with dimension reduction and shrinkage was feasible and improved confounding adjustment in two studies of newly marketed medications.
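The two quantities reported in the abstract, the c-statistic of a risk score and an odds ratio stratified on that score, can be illustrated with a minimal sketch. The abstract does not say how the stratified OR was estimated or how many strata were used, so the Mantel-Haenszel estimator, the rank-based equal-size strata, and the function names `c_statistic`, `assign_strata`, and `mantel_haenszel_or` are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def c_statistic(scores, outcomes):
    """Fraction of (case, non-case) pairs in which the case has the higher
    score; tied pairs count as one half."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    noncases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for sc in cases:
        for sn in noncases:
            if sc > sn:
                concordant += 1.0
            elif sc == sn:
                concordant += 0.5
    return concordant / (len(cases) * len(noncases))


def assign_strata(scores, n_strata):
    """Rank patients by score and cut the ranking into n_strata groups of
    (nearly) equal size. Returns a stratum index 0..n_strata-1 per patient.
    The choice of equal-size rank strata is an assumption."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata


def mantel_haenszel_or(exposed, outcomes, strata):
    """Mantel-Haenszel pooled odds ratio across strata (assumed estimator).
    Per stratum: a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for e, y, s in zip(exposed, outcomes, strata):
        if e:
            idx = 0 if y else 1
        else:
            idx = 2 if y else 3
        tables[s][idx] += 1
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no discordant cells")
    return numerator / denominator
```

In a workflow like the one described, the scores would come from a DRS model fitted in the historical comparator cohort and then applied to the study cohort; `assign_strata` would group study patients by predicted risk, and `mantel_haenszel_or` would compare new-drug and comparator users within those groups.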

Similar resources

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...

Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease

Background and purpose: Machine learning is a class of modern and strong tools that can solve many important problems that nowadays humans may be faced with. Support vector regression (SVR) is a way to build a regression model which is an incredible member of the machine learning family. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...

Improving the Nonlinear Manifold Discriminant Model for Face Recognition with a Single Image per Person

Manifold learning is a dimension reduction method for extracting nonlinear structures of high-dimensional data. Many methods have been introduced for this purpose. Most of these methods usually extract a global manifold for data. However, in many real-world problems, there is not only one global manifold, but also additional information about the objects is shared by a large number of manifolds...

Supervised Feature Extraction of Face Images for Improvement of Recognition Accuracy

Dimensionality reduction methods transform or select a low dimensional feature space to efficiently represent the original high dimensional feature space of data. Feature reduction techniques are an important step in many pattern recognition problems in different fields especially in analyzing of high dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...

Analysis of Censored Survival Data with Dimension Reduction Methods: Tehran Lipid and Glucose Study

Cardiovascular diseases (CVDs) are the leading cause of death worldwide. To specify an appropriate model to determine the risk of CVD and predict survival rate, users are required to specify a functional form which relates the outcome variables to the input ones. In this paper, we proposed a dimension reduction method using a general model, which includes many widely used survival m...

Journal:
  • Emerging Themes in Epidemiology

Volume 13

Publication date: 2016